
[kube-prometheus-stack] Add upgrade path from stable/prometheus-operator #119

Conversation

fktkrt
Contributor

@fktkrt fktkrt commented Sep 19, 2020

What this PR does / why we need it:

I had to migrate our stable/prometheus-operator to the new chart while keeping the existing PersistentVolume.
To do this I had to perform the steps provided below. After the procedure I was able to access the old time-series data stored in the PV.
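The PV-retention flow described above can be sketched roughly as follows. This is an unverified outline, not the verbatim guide: `<old-pv-name>`, the `monitoring` namespace, and the release names are placeholders that must be checked against your own installation.

```shell
# Hedged sketch of the migration's PV-retention steps.
# First verify the actual object names:
kubectl get pv,pvc -n monitoring

# 1. Make sure the volume survives when its claim is deleted:
kubectl patch pv <old-pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 2. Remove the old release (the PV is kept thanks to Retain):
helm uninstall prometheus-operator -n monitoring

# 3. Clear the stale claim reference so a new PVC can bind to the PV:
kubectl patch pv <old-pv-name> --type json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'

# 4. Install the new chart with a storageSpec matching the retained PV.
helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack -n monitoring
```

The `Retain` reclaim policy plus the `claimRef` removal are what let the old time-series data reappear under the new release.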

Which issue this PR fixes

Special notes for your reviewer:

This guide covers a fresh installation of the new chart, so non-persisted configuration (e.g. in values.yaml, custom dashboards) may be lost after the process is finished.

Checklist

  • DCO signed
  • Chart Version bumped
  • Title of the PR starts with chart name (e.g. [prometheus-couchdb-exporter])

@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch from d2a10a6 to de00619 Compare September 19, 2020 22:39
@gkarthiks gkarthiks changed the title [prometheus-couchdb-exporter] Add upgrade path from stable/prometheus-operator [kube-prometheus-stack] Add upgrade path from stable/prometheus-operator Sep 20, 2020
@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch from 409d910 to 09cfc5f Compare September 20, 2020 09:31
Member

@gkarthiks gkarthiks left a comment

LGTM, let's wait for one more reviewer.

@scottrigby
Member

Thanks 😊 This is great if a user needs to upgrade the release name or namespace.

But given that users can do a full name override to match their existing release, would you be willing to add a similar scenario for retaining PVs for an in-place upgrade?

Also is it clear to everyone what data is being persisted here, and how exactly to do that? Users can persist the grafana subchart user account and dashboard data as well, right?
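For reference, the in-place route mentioned above could look roughly like this values.yaml fragment. This is a sketch only, assuming the old release was literally named prometheus-operator; the storage size is a placeholder.

```yaml
# Sketch: keep generated resource names identical to the old
# stable/prometheus-operator release so existing PVCs keep matching.
fullnameOverride: prometheus-operator

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi   # placeholder; match the existing PVC size
```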

@scottrigby
Member

Alternatively, we could merge this as-is, and allow other members of the community to document how to do so for an in-place upgrade. WDYT?

@gkarthiks
Member

@scottrigby I like your second option: merge this as-is to support @fktkrt's contribution, and other community members (or maybe @fktkrt as well) might be interested in contributing the in-place upgrade documentation.

@scottrigby
Member

scottrigby commented Sep 21, 2020

Before this is merged, let's have someone try these steps to ensure they work properly. I'm not sure when I'll get to that but once that's verified by at least one other person we can merge this.

I created #121 to make sure we don't forget to address scenarios and details I asked about above.

@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch from 91eeb1c to 080e5d6 Compare September 21, 2020 10:16
@fktkrt
Contributor Author

fktkrt commented Sep 21, 2020

Just performed it in another cluster, added a missing step and cleaned up the second patch syntax.

@fktkrt
Contributor Author

fktkrt commented Sep 29, 2020

@m42u Please see my recent changes to this PR based on your comments.
I've omitted deleting the endpoint, because that should be taken care of when removing the service itself.

... there's a volumeName under prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec that might help, but this needs validation.

@desaintmartin, there's this option that might solve your comment; any help validating it is welcome, as I have limited time at the moment.
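The unvalidated volumeName idea mentioned above would look something like this; the PV name is a placeholder.

```yaml
# Unvalidated, as noted above: pin the new chart's PVC directly
# to the retained PersistentVolume by name.
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          volumeName: <existing-pv-name>   # placeholder: the retained PV
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
```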

@flouthoc

@fktkrt This path requires me to uninstall the previous chart release; wouldn't this cause any downtime?

@fktkrt
Contributor Author

fktkrt commented Sep 30, 2020

> @fktkrt This path requires me to uninstall the previous chart release; wouldn't this cause any downtime?

@flouthoc Yes, from the point the old release is removed until the new one is installed (1–2 minutes), you won't have metrics.
If you are looking for a zero-downtime upgrade, Helm's fullnameOverride option might be a good fit, as others have pointed out, although I don't have any personal experience with it (at least in terms of PVs/PVCs).

Do you maybe want to give it a try? :)

@flouthoc

flouthoc commented Oct 1, 2020

@fktkrt Thanks, I guess 1–2 minutes won't do any damage. I am performing a drill on a dummy cluster to record the downtime delta; after that, I guess we are good to go. Any reason why this guide is not merged to master yet?

@fktkrt
Contributor Author

fktkrt commented Oct 3, 2020

> @fktkrt Thanks, I guess 1–2 minutes won't do any damage. I am performing a drill on a dummy cluster to record the downtime delta; after that, I guess we are good to go. Any reason why this guide is not merged to master yet?

@scottrigby said let's first wait for a few more successful migration reports, then we can merge this.
So @flouthoc, can you report one more for us? :)

@shaikatz

shaikatz commented Oct 7, 2020

I just did a migration from the jsonnet version of kube-prometheus to the Helm chart using these instructions, more or less, and it worked perfectly. I think this should get merged :-)

@fktkrt fktkrt force-pushed the add-prometheus-operator-migration-guide branch 4 times, most recently from 856abac to 2ec1163 Compare October 8, 2020 09:27
@scottrigby
Member

We have some good user validation of the steps, so let's merge this. It just needs conflicts to be resolved.

@scottrigby
Member

I have resolved the conflict

scottrigby previously approved these changes Oct 9, 2020
@scottrigby
Member

Fixed markdownlint on your branch @fktkrt

@scottrigby scottrigby merged commit a72459c into prometheus-community:main Oct 9, 2020
@llamahunter

The prometheus.prometheusSpec.serviceMonitorSelector changes with the new chart from release: prometheus-operator to release: kube-prometheus-stack. If you have a lot of ServiceMonitor objects around, do you then need to go through and manually update all your existing ServiceMonitor objects? Is there any way to make it continue to use the old selector? Looking at the helm chart, it didn't seem possible, as the Release.Name is hard coded into the ServiceMonitor objects created by kube-prometheus-stack.
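To illustrate the selector change being discussed: the object name and port below are examples, while the release labels are the two charts' default release names.

```yaml
# A ServiceMonitor selected by the old chart's Prometheus carried:
#   release: prometheus-operator
# The new chart's default serviceMonitorSelector instead expects:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: foo-servicemonitor          # example name
  labels:
    release: kube-prometheus-stack  # must match the new Helm release name
spec:
  selector:
    matchLabels:
      app: foo
  endpoints:
    - port: metrics                 # example port name
```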

@fktkrt
Contributor Author

fktkrt commented Oct 16, 2020

I had no issues regarding this with the following setup:

I am using ServiceMonitors and PodMonitors, and labeling those this way:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: foo-podmonitor
spec:
  selector:
    matchLabels:
      app: foo
  podMetricsEndpoints:
    - port: metrics   # required field; port name from the target pod spec

This is my prometheusSpec:

prometheus:
  prometheusSpec:
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

Additionally, for my custom environments I am overriding these with externalLabels like this:

prometheus.prometheusSpec.externalLabels.prometheus=dev-prometheus

My Prometheus instance has default labels:

Labels:       app=prometheus
              controller-revision-hash=prometheus-kube-prometheus-stack-prometheus-677954c66f
              prometheus=kube-prometheus-stack-prometheus
              statefulset.kubernetes.io/pod-name=prometheus-kube-prometheus-stack-prometheus-0

and the new release can continue scraping all my existing Pod/ServiceMonitors created with the legacy chart.

If you have access to a test cluster (or can deploy a playground with k3s or similar), you can easily double-check it, just to make sure nothing breaks.

@llamahunter

By turning off the rule/serviceMonitor/podMonitor selectors, doesn't that make it difficult to tell an instance of prometheus to only monitor things that are related to that instance? What if you have multiple prometheuses (promethei?) running in your cluster?

@fktkrt
Contributor Author

fktkrt commented Oct 16, 2020

You can use additional labels to tie apps to specific prometheus instances.
Another idea: if you're using additional charts to manage the CRDs, you can solve this by modifying only the template of your CRD-generator chart.
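A sketch of that label-based scoping, assuming a hypothetical team label: each Prometheus instance selects only the monitors carrying its label, instead of disabling the selectors entirely.

```yaml
prometheus:
  prometheusSpec:
    # Select explicitly rather than setting the NilUsesHelmValues
    # flags to false:
    serviceMonitorSelector:
      matchLabels:
        team: payments   # hypothetical label on this instance's monitors
    podMonitorSelector:
      matchLabels:
        team: payments
```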

(By the way, you were close with promethei, https://prometheus.io/docs/introduction/faq/#what-is-the-plural-of-prometheus :) )

stamzid pushed a commit to Unstructured-IO/prometheus-community-helm-charts that referenced this pull request Mar 3, 2023
…tor (prometheus-community#119)

* add upgrade path from stable/prometheus-operator

Signed-off-by: fktkrt <[email protected]>

* bump chart

Signed-off-by: fktkrt <[email protected]>

* fix lint

Signed-off-by: fktkrt <[email protected]>

* fix lint

Signed-off-by: fktkrt <[email protected]>

* generalize object references

Signed-off-by: fktkrt <[email protected]>

* fix kubectl patch syntax, add missing step

Signed-off-by: fktkrt <[email protected]>

* bump Chart version

Signed-off-by: fktkrt <[email protected]>

* add step to remove legacy kubelet service

Signed-off-by: fktkrt <[email protected]>

* add instructions to specify AZ trough labels

Signed-off-by: fktkrt <[email protected]>

* remove unnecessary namespace reference, mention CRD provisioning, add missing zone identifier

Signed-off-by: fktkrt <[email protected]>

* add volumeClaimTemplate example

Signed-off-by: fktkrt <[email protected]>

* Fix markdownlint. Also change 'bash' to appropriate linguist codeblock option

Signed-off-by: Scott Rigby <[email protected]>

Co-authored-by: Scott Rigby <[email protected]>
Development

Successfully merging this pull request may close these issues.

[kube-prometheus-stack] Can't find upgrade procedure from stable/prometheus-operator